Exploring metadata of the human vaginal microbiome

Group 6

Alberte Englund
Mathilde Due
Line Winther Gormsen
Sigrid Frandsen
Kristine Johansen

Examples:

Just so we know the things we can use in the presentation document. Here I write in bold. Here I write in italics. Here I write in bold and italics. Here I write in red

Study Description

  • Dataset:
  • Aim

Examples:

Several columns

Left column

Right column

Examples: smaller text on slide

A slide where the text can be made smaller if the content does not fit.

  • Bullet Point 1

  • Bullet Point 2

  • Bullet Point 3

  • Bullet Point 4

  • Bullet Point 5

  • Bullet Point 6

Examples: scrollable slide

A slide that can be scrolled to see the rest of the content if there is too much.

  • Bullet Point 1

  • Bullet Point 2

  • Bullet Point 3

  • Bullet Point 4

  • Bullet Point 5

  • Bullet Point 6

  • Bullet Point 7

  • Bullet Point 8

  • Bullet Point 9

  • Bullet Point 10

  • Bullet Point 11

  • Bullet Point 12

  • Bullet Point 13

Asides and footnotes

Asides

Slide content

Footnotes

  • Green 1
  • Brown
  • Purple

Study Description

  • Metadata from MGnify’s vaginal microbiome genome catalogue

  • Uncover patterns in genome quality, taxonomic composition, and ecological characteristics.

  • Investigate potential patterns for diagnosis of endometriosis via associated pathogens:

    • Anaerococcus, Ureaplasma, Gardnerella, Veillonella, Corynebacterium, Peptoniphilus, Candida albicans, Alloscardovia 1
  • Raw data:

Untidy -> tidy data

  1. Split the “Lineage” variable into seven new variables, one per taxonomic rank (Domain, Phylum, Class, Order, Family, Genus, Species).
  2. Convert all “not provided” values to NA.
  3. Extract each taxonomic rank and remove prefixes.
  4. Convert empty strings to NA in the new taxonomy columns.
  5. Remove the GTDB suffixes (e.g. “_A”) to streamline taxonomies.
  6. Remove columns that will not be used in our analysis.
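
The six steps above can be sketched roughly as follows. This is a minimal illustration on a two-row toy table, not the group's actual cleaning script: the toy values, the helper name `clean_rank`, and the exact regular expressions are assumptions, and the dropped columns are inferred from the difference between the raw and clean printouts.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy input with GTDB-style lineage strings (illustrative values only)
raw <- tibble(
  Genome  = c("G1", "G2"),
  Country = c("not provided", "Denmark"),
  Lineage = c(
    "d__Bacteria;p__Bacillota_A;c__Clostridia;o__Tissierellales;f__Peptoniphilaceae;g__Peptoniphilus_A;s__",
    "d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus crispatus"
  )
)

ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical helper covering steps 3 to 5 for one taxonomy column
clean_rank <- function(x) {
  x <- str_remove(x, "^[a-z]__")      # step 3: drop rank prefixes such as "g__"
  x <- na_if(x, "")                   # step 4: empty string -> NA
  str_remove_all(x, "_[A-Z]+\\b")     # step 5: drop GTDB suffixes such as "_A"
}

clean <- raw |>
  # step 2: "not provided" -> NA in all character columns
  mutate(across(where(is.character), \(x) na_if(x, "not provided"))) |>
  # step 1: one column per taxonomic rank (Lineage itself is dropped)
  separate(Lineage, into = ranks, sep = ";") |>
  mutate(across(all_of(ranks), clean_rank)) |>
  # step 6: drop columns not used in the analysis, if present
  select(-any_of(c("Genome_accession", "Species_rep", "Sample_accession",
                   "Study_accession", "FTP_download")))

print(clean)
```

Keeping steps 3 to 5 in one helper means every rank column is treated identically, which makes it easy to check the result one column at a time.
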
# Path given relative to the project root (assumed to be group_06_project)
print(readr::read_tsv(
  here::here("data", "_raw", "genomes-all_metadata.tsv")))
# A tibble: 618 × 20
   Genome        Genome_type  Length N_contigs    N50 GC_content Completeness
   <chr>         <chr>         <dbl>     <dbl>  <dbl>      <dbl>        <dbl>
 1 MGYG000303700 MAG          678213         2 466332       47.8         63.7
 2 MGYG000303701 MAG         1500176        18 112881       42.4         87.8
 3 MGYG000303702 MAG         1210062        44  48790       26.4         94.8
 4 MGYG000303703 MAG         1706016        27  89653       44.6         93.7
 5 MGYG000303704 MAG          703182         7 111709       47.8         63.7
 6 MGYG000303705 MAG         2542045       112  34925       48           97.9
 7 MGYG000303706 MAG         1449687       185  10153       34.8         85.2
 8 MGYG000303707 MAG         1874692        90  28768       37.1         99.0
 9 MGYG000303708 MAG         1480380        12 169949       42.2         87.6
10 MGYG000303709 MAG          694644        57  15063       47.9         62.0
# ℹ 608 more rows
# ℹ 13 more variables: Contamination <dbl>, rRNA_5S <dbl>, rRNA_16S <dbl>,
#   rRNA_23S <dbl>, tRNAs <dbl>, Genome_accession <chr>, Species_rep <chr>,
#   Lineage <chr>, Sample_accession <chr>, Study_accession <chr>,
#   Country <chr>, Continent <chr>, FTP_download <chr>
# Path given relative to the project root (assumed to be group_06_project)
untidy_data <- readr::read_tsv(
  here::here("data", "_raw", "genomes-all_metadata.tsv"))

# Show the first ten values of the Lineage column
untidy_data |>
  dplyr::select(Lineage) |>
  dplyr::slice_head(n = 10) |>
  print()
# A tibble: 10 × 1
   Lineage                                                                      
   <chr>                                                                        
 1 d__Bacteria;p__Patescibacteria;c__Saccharimonadia;o__Saccharimonadales;f__Na…
 2 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Saccharofermentanales;f__Fastidi…
 3 d__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;f__Gemellaceae;g__Ge…
 4 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Saccharofermentanales;f__Fastidi…
 5 d__Bacteria;p__Patescibacteria;c__Saccharimonadia;o__Saccharimonadales;f__Na…
 6 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidacea…
 7 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Tissierellales;f__Peptoniphilace…
 8 d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g…
 9 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Saccharofermentanales;f__Fastidi…
10 d__Bacteria;p__Patescibacteria;c__Saccharimonadia;o__Saccharimonadales;f__Na…
# Path given relative to the project root (assumed to be group_06_project)
print(readr::read_tsv(
  here::here("data", "02_dat_clean.tsv")))
# A tibble: 618 × 21
   Genome        Genome_type  Length N_contigs    N50 GC_content Completeness
   <chr>         <chr>         <dbl>     <dbl>  <dbl>      <dbl>        <dbl>
 1 MGYG000303700 MAG          678213         2 466332       47.8         63.7
 2 MGYG000303701 MAG         1500176        18 112881       42.4         87.8
 3 MGYG000303702 MAG         1210062        44  48790       26.4         94.8
 4 MGYG000303703 MAG         1706016        27  89653       44.6         93.7
 5 MGYG000303704 MAG          703182         7 111709       47.8         63.7
 6 MGYG000303705 MAG         2542045       112  34925       48           97.9
 7 MGYG000303706 MAG         1449687       185  10153       34.8         85.2
 8 MGYG000303707 MAG         1874692        90  28768       37.1         99.0
 9 MGYG000303708 MAG         1480380        12 169949       42.2         87.6
10 MGYG000303709 MAG          694644        57  15063       47.9         62.0
# ℹ 608 more rows
# ℹ 14 more variables: Contamination <dbl>, rRNA_5S <dbl>, rRNA_16S <dbl>,
#   rRNA_23S <dbl>, tRNAs <dbl>, Country <chr>, Continent <chr>, Domain <chr>,
#   Phylum <chr>, Class <chr>, Order <chr>, Family <chr>, Genus <chr>,
#   Species <chr>

Augmentation of the data

  • Yes/no inserted
[1] 4
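
One possible reading of the yes/no variable is a flag for genomes whose genus is among the endometriosis-associated genera listed in the study description. The sketch below works under that assumption; the column name `Endo_associated` and the toy rows are hypothetical, and it does not reproduce the count printed above.

```r
library(dplyr)

# Genera from the study description slide. Candida albicans is a species,
# so it would need a Species-level match and is left out of this genus lookup.
endo_genera <- c("Anaerococcus", "Ureaplasma", "Gardnerella", "Veillonella",
                 "Corynebacterium", "Peptoniphilus", "Alloscardovia")

# Toy rows standing in for the clean data (illustrative values only)
toy <- tibble(
  Genome = c("G1", "G2", "G3"),
  Genus  = c("Gardnerella", "Lactobacillus", NA)
)

augmented <- toy |>
  # %in% returns FALSE for NA, so genomes without a genus are labelled "no"
  mutate(Endo_associated = if_else(Genus %in% endo_genera, "yes", "no"))

print(augmented)

# Number of flagged genomes in the toy table
sum(augmented$Endo_associated == "yes")
```
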

Data analysis

Data description

  • Description of the data

  • Plots

  • Point 3

Data analysis: Completeness vs. contamination

Just testing how the caption looks

Data analysis: Phylogenetic tree

Data analysis: Endometriosis association

Data analysis: Heatmap

Data analysis: PCA
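
As a rough illustration of how a PCA on the genome metrics could be set up, the sketch below runs base R's `prcomp` on five numeric columns for the first ten genomes in the raw-data printout. Which variables and genomes go into the actual analysis is not stated here, so this selection is an assumption.

```r
# First ten genomes, values copied from the raw-data printout
quality <- data.frame(
  Length       = c(678213, 1500176, 1210062, 1706016, 703182,
                   2542045, 1449687, 1874692, 1480380, 694644),
  N_contigs    = c(2, 18, 44, 27, 7, 112, 185, 90, 12, 57),
  N50          = c(466332, 112881, 48790, 89653, 111709,
                   34925, 10153, 28768, 169949, 15063),
  GC_content   = c(47.8, 42.4, 26.4, 44.6, 47.8, 48, 34.8, 37.1, 42.2, 47.9),
  Completeness = c(63.7, 87.8, 94.8, 93.7, 63.7, 97.9, 85.2, 99.0, 87.6, 62.0)
)

# Centre and scale so variables on very different scales
# (base pairs vs. percentages) contribute equally
pca <- prcomp(quality, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 2))

# Coordinates of each genome on the first two components, ready for plotting
scores <- pca$x[, 1:2]
```

Scaling matters here because `Length` and `N50` are orders of magnitude larger than the percentage columns and would otherwise dominate the first component.
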

Data analysis: p value

Discussion and conclusion

  • Is there any significant relationship between the vaginal microbiota and endometriosis?
  • Aim